1 Import Libraries

List of used packages: knitr dplyr ggplot2 broom reshape2 janitor plm pwt9 quarto renv shiny targets testthat tidyverse tibble usethis rio lubridate purrr Hmisc plotly hrbrthemes xts seasonal tsbox forecast tseries plotly ggridges shades urca

colorize <- function(x, color) {
  if (knitr::is_latex_output()) {
    sprintf("\\textcolor{%s}{%s}", color, x)
  } else if (knitr::is_html_output()) {
    sprintf("<span style='color: %s;'>%s</span>", color,
      x)
  } else x
}
#webshot::install_phantomjs()

2 Introduction

2.1 Describe Dataset

The dataset used for this project is Daily Delhi Climate, which consists of the following columns:

1. date: Date in YYYY-MM-DD format, starting at “2013-01-01” and ending at “2017-01-01”.
2. meantemp: Mean temperature, averaged over multiple 3-hour intervals in a day.
3. humidity: Humidity value for the day (grams of water vapor per cubic meter of air).
4. wind_speed: Wind speed measured in kmph.
5. meanpressure: Atmospheric pressure reading for the day (nominally in atm, though the values in the summary below are on the hPa scale).

Q1. How can I find out whether analyzing meantemp individually (univariate) suffices, or whether including the other columns in a multivariate analysis would be worth the effort to find more meaningful patterns and forecasts?

2.2 Goal and Procedure

The goal of this project is to analyze and forecast the mean temperature in Delhi, recorded in the meantemp column. After importing the dataset, outliers are removed in the Preprocessing section. The meantemp column is then assigned to a time series object in the Construct Time Series section for further processing, analysis, and forecasting. After detecting seasonality using plots, the time series is seasonally adjusted with X-13ARIMA-SEATS, and the remaining trend is removed in the Detrend section.

Before forecasting, I check the stationarity of the time series, since stationarity is an assumption of the ARIMA model. For this purpose, unit root tests are applied in the Stationarity section.

Finally, I use an ARIMA model to forecast the time series in the Forecast Time Series section.

df_train <- read_csv("data/DailyDelhiClimateTrain.csv")
## Rows: 1462 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl  (4): meantemp, humidity, wind_speed, meanpressure
## date (1): date
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(df_train)
##       date               meantemp        humidity        wind_speed    
##  Min.   :2013-01-01   Min.   : 6.00   Min.   : 13.43   Min.   : 0.000  
##  1st Qu.:2014-01-01   1st Qu.:18.86   1st Qu.: 50.38   1st Qu.: 3.475  
##  Median :2015-01-01   Median :27.71   Median : 62.62   Median : 6.222  
##  Mean   :2015-01-01   Mean   :25.50   Mean   : 60.77   Mean   : 6.802  
##  3rd Qu.:2016-01-01   3rd Qu.:31.31   3rd Qu.: 72.22   3rd Qu.: 9.238  
##  Max.   :2017-01-01   Max.   :38.71   Max.   :100.00   Max.   :42.220  
##   meanpressure     
##  Min.   :  -3.042  
##  1st Qu.:1001.580  
##  Median :1008.563  
##  Mean   :1011.105  
##  3rd Qu.:1014.945  
##  Max.   :7679.333
df_train |> describe()
## df_train 
## 
##  5  Variables      1462  Observations
## --------------------------------------------------------------------------------
## date 
##          n    missing   distinct       Info       Mean        Gmd        .05 
##       1462          0       1462          1 2015-01-01      487.7 2013-03-15 
##        .10        .25        .50        .75        .90        .95 
## 2013-05-27 2014-01-01 2015-01-01 2016-01-01 2016-08-07 2016-10-19 
## 
## lowest : 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05
## highest: 2016-12-28 2016-12-29 2016-12-30 2016-12-31 2017-01-01
## --------------------------------------------------------------------------------
## meantemp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1462        0      617        1     25.5    8.331    12.51    14.62 
##      .25      .50      .75      .90      .95 
##    18.86    27.71    31.31    33.71    35.43 
## 
## lowest :  6.000000  7.000000  7.166667  7.400000  8.666667
## highest: 38.200000 38.272727 38.428571 38.500000 38.714286
## --------------------------------------------------------------------------------
## humidity 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1462        0      897        1    60.77    18.96    29.00    36.21 
##      .25      .50      .75      .90      .95 
##    50.38    62.62    72.22    81.85    86.99 
## 
## lowest :  13.42857  15.85714  18.46667  19.00000  19.50000
## highest:  96.12500  96.62500  96.85714  98.00000 100.00000
## --------------------------------------------------------------------------------
## wind_speed 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1462        0      736        1    6.802    4.878    0.925    1.625 
##      .25      .50      .75      .90      .95 
##    3.475    6.222    9.238   12.608   14.813 
## 
## lowest :  0.0000000  0.4625000  0.4750000  0.5285714  0.6166667
## highest: 27.7750000 30.6857143 33.3250000 34.4875000 42.2200000
## --------------------------------------------------------------------------------
## meanpressure 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     1462        0      626        1     1011    22.92    996.8    998.1 
##      .25      .50      .75      .90      .95 
##   1001.6   1008.6   1014.9   1017.8   1019.3 
## 
## lowest :   -3.041667   12.045455  310.437500  633.900000  938.066667
## highest: 1022.125000 1023.000000 1350.296296 1352.615385 7679.333333
##                                                     
## Value          0   300   600   900  1000  1400  7700
## Frequency      2     1     1     2  1453     2     1
## Proportion 0.001 0.001 0.001 0.001 0.994 0.001 0.001
## 
## For the frequency table, variable is rounded to the nearest 100
## --------------------------------------------------------------------------------

3 Visualize Data

Below is an interactive plot of the time series.

p <- df_train |>
  ggplot( aes(x=date, y=meantemp)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ylab("mean temperature (°C)") +
    theme_ipsum()

# Turn it interactive with ggplotly
p <- ggplotly(p)
#p
p

4 Preprocessing

We can see an outlier at the last observation (the last row of the dataframe): an abrupt drop in temperature. This would cause problems both in the analysis that follows and when functions are later applied to the time series. Therefore, I replace the last observation’s value with the previous one.

Q2. Is imputing with the last observation the right approach here, or would it be preferable to use the median of the last n rows, for instance? Maybe it won’t matter, since when I aggregate the data (per week, month, quarter, etc.) I remove the last observation.
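As a small self-contained check of Q2, the two candidate imputations can be compared directly. The toy vector below stands in for the tail of meantemp, with the final value playing the outlier; the 5-day window is an arbitrary choice for illustration.

```r
# Toy stand-in for the tail of meantemp; the final value 5.0 is the "outlier".
x <- c(16.8, 17.2, 15.2, 14.1, 15.1, 5.0)
n <- length(x)

locf <- x[n - 1]                     # last observation carried forward
med5 <- median(x[(n - 5):(n - 1)])   # median of the previous 5 values

# For a slow-moving daily series the two values are usually close, so the
# choice rarely matters; dropping the row entirely (head(df_train, -1)) is
# equally defensible since 2017 contributes only one observation.
c(locf = locf, median5 = med5)
```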

previous_value <- df_train$meantemp[df_train$date == as.Date('2016-12-31')]

df_train$meantemp[df_train$date == as.Date('2017-01-01')]<- previous_value 
#df_train <- head(df_train, -1)
head(df_train)
## # A tibble: 6 × 5
##   date       meantemp humidity wind_speed meanpressure
##   <date>        <dbl>    <dbl>      <dbl>        <dbl>
## 1 2013-01-01    10        84.5       0           1016.
## 2 2013-01-02     7.4      92         2.98        1018.
## 3 2013-01-03     7.17     87         4.63        1019.
## 4 2013-01-04     8.67     71.3       1.23        1017.
## 5 2013-01-05     6        86.8       3.7         1016.
## 6 2013-01-06     7        82.8       1.48        1018
tail(df_train)
## # A tibble: 6 × 5
##   date       meantemp humidity wind_speed meanpressure
##   <date>        <dbl>    <dbl>      <dbl>        <dbl>
## 1 2016-12-27     16.8     67.6       8.34        1017.
## 2 2016-12-28     17.2     68.0       3.55        1016.
## 3 2016-12-29     15.2     87.9       6           1017.
## 4 2016-12-30     14.1     89.7       6.27        1018.
## 5 2016-12-31     15.1     87         7.32        1016.
## 6 2017-01-01     15.1    100         0           1016
p <- df_train |>
  ggplot( aes(x=date, y=meantemp)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ylab("mean temperature (°C)") +
    theme_ipsum()

# Turn it interactive with ggplotly
p <- ggplotly(p)
#p
p

Check whether there are any missing dates:

date_range <- seq(min(df_train$date), max(df_train$date), by = 1) 
date_range[!date_range %in% df_train$date] 
## Date of length 0
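To show what a non-empty result of this check would look like, here is the same pattern on a toy date range with one day deliberately removed (the dates are made up for illustration):

```r
# Toy date range with 2013-01-04 deliberately dropped to simulate a gap.
dates <- seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = 1)
observed <- dates[-4]

missing <- dates[!dates %in% observed]
missing   # the single missing day, 2013-01-04
```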

5 Time Series Analysis

5.1 Construct Time Series

Now we use the meantemp column to create our time series. I assign the series to an xts object, but since many of the later functions require a ts object, each time I define an xts object I also convert it to ts using tsbox::ts_ts().

min(df_train$date)
## [1] "2013-01-01"
max(df_train$date)
## [1] "2017-01-01"
#ts_train <- zoo(df_train$meantemp, df_train$date)

xts_train <- xts(df_train$meantemp, order.by = df_train$date)
class(xts_train)
## [1] "xts" "zoo"
head(xts_train)
##                 [,1]
## 2013-01-01 10.000000
## 2013-01-02  7.400000
## 2013-01-03  7.166667
## 2013-01-04  8.666667
## 2013-01-05  6.000000
## 2013-01-06  7.000000
tail(xts_train)
##                [,1]
## 2016-12-27 16.85000
## 2016-12-28 17.21739
## 2016-12-29 15.23810
## 2016-12-30 14.09524
## 2016-12-31 15.05263
## 2017-01-01 15.05263
# convert xts to ts

## Create a daily Date object for ts
#inds <- seq(as.Date("2013-01-01"), as.Date("2017-01-01"), by = "day")

#set.seed(25)
#ts_train <- ts(df_train$meantemp,     # random data
#           start = c(2013, as.numeric(format(inds[1], "%j"))),
#           frequency = 365)


#ts_train <- ts(df_train$meantemp, start = decimal_date(ymd("2013-01-01")), frequency = 365.25 / 7)

ts_train <-ts_ts(xts_train)
head(ts_train)
## Time Series:
## Start = 2013 
## End = 2013.01368953503 
## Frequency = 365.2425 
## [1] 10.000000  7.400000  7.166667  8.666667  6.000000  7.000000
tail(ts_train)
## Time Series:
## Start = 2016.98639260218 
## End = 2017.00008213721 
## Frequency = 365.2425 
## [1] 16.85000 17.21739 15.23810 14.09524 15.05263 15.05263

5.2 Seasonality

From the initial plot I judge that there is yearly seasonality. To check whether there are seasonal patterns at more granular periods, I use seasonal plots. Before that, I aggregate the data weekly, monthly, and quarterly.

5.2.1 Seasonality Plots

# Weekly aggregated temperature
# (note: FUN = sum gives weekly totals; use FUN = mean for weekly averages)
xts_week_train <- apply.weekly(xts_train, sum)
ts_week_train <- ts_ts(xts_week_train)

# Monthly aggregated temperature
xts_mon_train <- aggregate(xts_train, by = as.yearmon, FUN = sum)
ts_mon_train <- ts_ts(xts_mon_train)

# Quarterly aggregated temperature
xts_quar_train <- aggregate(xts_train, by = as.yearqtr, FUN = sum)
ts_quar_train <- ts_ts(xts_quar_train)


# Yearly aggregated temperature
as.year <- function(x) as.integer(as.yearmon(x))
xts_year_train <- aggregate(xts_train, by = as.year, FUN = sum)
#ts_year_train <-ts_ts(xts_year_train)
#xts_year_train[1]

The year 2017 has only one observation, so I remove it from all the aggregated datasets. I couldn’t do this before aggregating; otherwise I would have hit the error Error: series has no regular pattern.

xts_week_train <- head(xts_week_train, -1)
xts_mon_train <- head(xts_mon_train, -1)
xts_quar_train <- head(xts_quar_train, -1)

ts_week_train <- head(ts_week_train, -1)
ts_mon_train <- head(ts_mon_train, -1)
ts_quar_train <- head(ts_quar_train, -1)
#options(repr.plot.width = 7, repr.plot.height =20)
forecast::ggseasonplot(ts_mon_train, year.labels=TRUE, year.labels.left=TRUE, labelgap = 0.1) +
  ylab("degree") +
  ggtitle("Seasonal plot: Monthly Mean Temperature")

forecast::ggseasonplot(ts_mon_train, year.labels=TRUE, year.labels.left=TRUE, labelgap = 0.1, polar=TRUE) +
  ylab("degree") +
  ggtitle("Polar Seasonal plot: Monthly Mean Temperature")

#options(repr.plot.width = 7, repr.plot.height =20)
forecast::ggseasonplot(ts_quar_train, year.labels=TRUE, year.labels.left=TRUE, labelgap = 0.1) +
  ylab("degree") +
  ggtitle("Seasonal plot: Quarterly Mean Temperature")

forecast::ggseasonplot(ts_quar_train, year.labels=TRUE, year.labels.left=TRUE, labelgap = 0.1, polar=TRUE) +
  ylab("degree") +
  ggtitle("Polar Seasonal plot: Quarterly Mean Temperature")

5.2.2 Deseasonalize

If I need to handle multiple seasonal periods together, I need the forecast::msts function, which declares several seasonal periods on a single series (it does not remove them by itself). For instance, below I declare weekly and yearly seasonality together.

des_ts_train <- msts(xts_train,seasonal.periods = c(7,365))
#head(des_xts_train)
#library(tsbox)
#ts_train <-ts_ts(xts_train)
#ts_train

class(des_ts_train)
## [1] "msts" "ts"

However, its output had an unfamiliar shape, and I wasn’t sure it uses the state-of-the-art X-13 decomposition, so I switched to X-13ARIMA-SEATS via the seasonal::seas function. That function has some limitations, as stated in the package’s reference manual: the number of observations must not exceed 780, and the seasonal period must not exceed 12. This rules out the original data ts_train and the weekly aggregated data ts_week_train, both of which produce the error Seasonal period too large. The highest-frequency aggregation that works is the monthly series ts_mon_train. However, I am concerned that this much aggregation loses significant patterns and information.

Q3. If you could kindly share your viewpoint here, it would help me decide how to proceed.
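One possible route around the 780-observation and period-12 limits of seas(), relevant to Q3, is STL-based decomposition via forecast::mstl(), which accepts a daily msts object directly and handles several seasonal periods at once. A sketch on simulated daily data (the real input would be des_ts_train from above):

```r
library(forecast)

# Simulated daily series with weekly and yearly components, standing in for
# the Delhi data; des_ts_train would be the real input.
set.seed(1)
t <- seq_len(365 * 3)
daily <- msts(20 + 10 * sin(2 * pi * t / 365) + sin(2 * pi * t / 7) +
                rnorm(length(t)),
              seasonal.periods = c(7, 365))

dec <- mstl(daily)    # columns: Data, Trend, Seasonal7, Seasonal365, Remainder
adj <- seasadj(dec)   # seasonally adjusted series, same length as the input
colnames(dec)
```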

length(xts_train)
## [1] 1462
length(ts_train)
## [1] 1462
nowXTS <-ts_xts(ts_train)
length(nowXTS)
## [1] 1462
length(ts_week_train)
## [1] 208
plot(ts_week_train)

length(ts_week_train)
## [1] 208
plot(ts_train)

length(ts_train)
## [1] 1462
plot(ts_mon_train)

length(ts_mon_train)
## [1] 48
m <- seas(ts_mon_train)
ts_train_adj <- final(m)
#ts_train_adj
length(ts_train_adj)
## [1] 48
plot(ts_train_adj)

Plot the original data along with the trend and the seasonally adjusted series:

#ts_train
#series(m, "forecast.forecasts")
#out(m)
#seasadj(m)
autoplot(ts_mon_train, series="Original Data") +
autolayer(trendcycle(m), series="Trend") +
autolayer(seasadj(m), series="Seasonally Adjusted") +
xlab("Year") + ylab("Mean Temperature") +
ggtitle("Mean Temperature Decomposed using X13") +
scale_colour_manual(values=c("gray","blue","red"),
           breaks=c("Original Data","Seasonally Adjusted","Trend"))

#ap < ggplotly(ap)

5.3 Detrend

In the seasonally adjusted time series ts_train_adj, I detected a remaining trend, so I detrend it using a log transformation followed by first differencing.

#ts_train_adj |> log() |> nsdiffs(alpha=0.01) -> ts_train_adj_det
ts_train_adj |> log() |> diff() -> ts_train_adj_det
plot(ts_train_adj_det)

#plot(d)
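A note to keep in mind for the forecasting step: predictions made on this log-differenced scale must be inverted (a cumulative sum, then exp()) before they are comparable with ts_train_adj. A base-R round-trip sketch:

```r
# Stand-in values for ts_train_adj.
x <- c(20, 22, 21, 25, 24)

d <- diff(log(x))   # the detrending transform used above

# Invert: cumulate the differences, anchored at the first log value, then exp.
x_back <- exp(cumsum(c(log(x[1]), d)))

all.equal(x_back, x)   # TRUE: the round trip recovers the original series
```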

5.4 Correlation Plots

  1. Weekly aggregation of the original time series
ggAcf(ts_week_train, lag=50)

pacf (ts_week_train, lag=50, pl = TRUE)

  2. Seasonally Adjusted
ggAcf(ts_train_adj, lag=10)

pacf (ts_train_adj, lag=10, pl = TRUE)

  3. Seasonally Adjusted and Detrended
ggAcf(ts_train_adj_det, lag=10)

pacf (ts_train_adj_det, lag=10, pl = TRUE)

5.5 Testing Stationarity: Unit Root Tests

5.5.1 ADF

  1. Original time series and its weekly aggregation
ts_train |> adf.test()
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_train
## Dickey-Fuller = -1.9871, Lag order = 11, p-value = 0.5838
## alternative hypothesis: stationary
ts_week_train |> adf.test()
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_week_train
## Dickey-Fuller = -3.8729, Lag order = 5, p-value = 0.01651
## alternative hypothesis: stationary
  2. Seasonally Adjusted
ts_train_adj |> adf.test() 
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_train_adj
## Dickey-Fuller = -2.5897, Lag order = 3, p-value = 0.3386
## alternative hypothesis: stationary
  3. Seasonally Adjusted and Detrended
ts_train_adj_det |> adf.test() 
## Warning in adf.test(ts_train_adj_det): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_train_adj_det
## Dickey-Fuller = -4.698, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

5.5.2 KPSS

  1. Original time series and its weekly aggregation
ts_train |> ur.kpss() |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 7 lags. 
## 
## Value of test-statistic is: 0.5812 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ts_week_train |> ur.kpss() |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.1618 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
  2. Seasonally Adjusted
ts_train_adj |> ur.kpss() |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.9974 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
  3. Seasonally Adjusted and Detrended
ts_train_adj_det |> ur.kpss() |> summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.0595 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
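The pattern in these tests (ADF fails to reject a unit root and KPSS rejects stationarity for the unadjusted series, with both reversed after differencing) is exactly what forecast::ndiffs() automates: it applies a chosen unit root test repeatedly and returns the number of differences required. A sketch on a simulated random walk:

```r
library(forecast)

set.seed(42)
rw <- cumsum(rnorm(200))   # a random walk: one difference should suffice

ndiffs(rw, test = "kpss")         # differences suggested by the KPSS test
ndiffs(diff(rw), test = "kpss")   # differenced walk: typically 0
```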

6 Time Series Forecasting

6.1 AUTO-ARIMA

  1. Forecast the original meantemp series aggregated weekly, since the daily data’s frequency is too high for ARIMA. Only in this case do I set seasonal=TRUE; the later cases use data that is already seasonally adjusted. Setting seasonal=TRUE makes the model search more time-consuming.
forecast_ts_week_train = auto.arima(ts_week_train,
                            trace = TRUE, 
                            seasonal=TRUE,
                            stepwise=FALSE,
                            approximation=FALSE)
## Warning: The time series frequency has been rounded to support seasonal
## differencing.
## 
##  ARIMA(0,1,0)(0,1,0)[52]                    : 1348.451
##  ARIMA(0,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,0)(1,1,0)[52]                    : 1317.269
##  ARIMA(0,1,0)(1,1,1)[52]                    : Inf
##  ARIMA(0,1,1)(0,1,0)[52]                    : 1304.937
##  ARIMA(0,1,1)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,1)(1,1,0)[52]                    : 1276.491
##  ARIMA(0,1,1)(1,1,1)[52]                    : Inf
##  ARIMA(0,1,2)(0,1,0)[52]                    : 1298.502
##  ARIMA(0,1,2)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,2)(1,1,0)[52]                    : 1271.798
##  ARIMA(0,1,2)(1,1,1)[52]                    : Inf
##  ARIMA(0,1,3)(0,1,0)[52]                    : 1300.128
##  ARIMA(0,1,3)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,3)(1,1,0)[52]                    : 1273.906
##  ARIMA(0,1,3)(1,1,1)[52]                    : Inf
##  ARIMA(0,1,4)(0,1,0)[52]                    : 1302.264
##  ARIMA(0,1,4)(0,1,1)[52]                    : Inf
##  ARIMA(0,1,4)(1,1,0)[52]                    : 1275.931
##  ARIMA(0,1,5)(0,1,0)[52]                    : 1301.246
##  ARIMA(1,1,0)(0,1,0)[52]                    : 1333.282
##  ARIMA(1,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(1,1,0)(1,1,0)[52]                    : 1303.962
##  ARIMA(1,1,0)(1,1,1)[52]                    : Inf
##  ARIMA(1,1,1)(0,1,0)[52]                    : 1298.238
##  ARIMA(1,1,1)(0,1,1)[52]                    : Inf
##  ARIMA(1,1,1)(1,1,0)[52.17869]                    : Inf
##  ARIMA(1,1,1)(1,1,1)[52.17869]                    : Inf
##  ARIMA(1,1,2)(0,1,0)[52]                    : 1300.221
##  ARIMA(1,1,2)(0,1,1)[52]                    : Inf
##  ARIMA(1,1,2)(1,1,0)[52.17869]                    : Inf
##  ARIMA(1,1,2)(1,1,1)[52.17869]                    : Inf
##  ARIMA(1,1,3)(0,1,0)[52]                    : 1302.264
##  ARIMA(1,1,3)(0,1,1)[52]                    : Inf
##  ARIMA(1,1,3)(1,1,0)[52.17869]                    : Inf
##  ARIMA(1,1,4)(0,1,0)[52]                    : 1304.398
##  ARIMA(2,1,0)(0,1,0)[52]                    : 1325.56
##  ARIMA(2,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(2,1,0)(1,1,0)[52]                    : 1295.714
##  ARIMA(2,1,0)(1,1,1)[52]                    : Inf
##  ARIMA(2,1,1)(0,1,0)[52]                    : 1300.196
##  ARIMA(2,1,1)(0,1,1)[52]                    : Inf
##  ARIMA(2,1,1)(1,1,0)[52]                    : Inf
##  ARIMA(2,1,1)(1,1,1)[52]                    : Inf
##  ARIMA(2,1,2)(0,1,0)[52]                    : 1302.316
##  ARIMA(2,1,2)(0,1,1)[52]                    : Inf
##  ARIMA(2,1,2)(1,1,0)[52]                    : 1274.862
##  ARIMA(2,1,3)(0,1,0)[52]                    : 1299.815
##  ARIMA(3,1,0)(0,1,0)[52]                    : 1317.794
##  ARIMA(3,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(3,1,0)(1,1,0)[52]                    : 1286.975
##  ARIMA(3,1,0)(1,1,1)[52]                    : Inf
##  ARIMA(3,1,1)(0,1,0)[52]                    : 1302.228
##  ARIMA(3,1,1)(0,1,1)[52]                    : Inf
##  ARIMA(3,1,1)(1,1,0)[52]                    : 1274.048
##  ARIMA(3,1,2)(0,1,0)[52]                    : 1303.9
##  ARIMA(4,1,0)(0,1,0)[52]                    : 1299.762
##  ARIMA(4,1,0)(0,1,1)[52]                    : Inf
##  ARIMA(4,1,0)(1,1,0)[52]                    : 1269.65
##  ARIMA(4,1,1)(0,1,0)[52]                    : 1301.905
##  ARIMA(5,1,0)(0,1,0)[52]                    : 1301.906
## 
## 
## 
##  Best model: ARIMA(4,1,0)(1,1,0)[52]
checkresiduals(forecast_ts_week_train)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,1,0)(1,1,0)[52]
## Q* = 33.201, df = 37, p-value = 0.6478
## 
## Model df: 5.   Total lags used: 42
autoplot(forecast(forecast_ts_week_train))

  2. Forecast the deseasonalized time series:
forecast_ts_train_adj = auto.arima(ts_train_adj,
                            trace = TRUE, 
                            seasonal= FALSE,
                            stepwise=FALSE,
                            approximation=FALSE)
## 
##  ARIMA(0,1,0)                               : 441.3883
##  ARIMA(0,1,0)            with drift         : 443.1154
##  ARIMA(0,1,1)                               : 429.9577
##  ARIMA(0,1,1)            with drift         : 429.5371
##  ARIMA(0,1,2)                               : 432.2379
##  ARIMA(0,1,2)            with drift         : 431.8095
##  ARIMA(0,1,3)                               : 434.4112
##  ARIMA(0,1,3)            with drift         : 433.7996
##  ARIMA(0,1,4)                               : 434.6353
##  ARIMA(0,1,4)            with drift         : Inf
##  ARIMA(0,1,5)                               : Inf
##  ARIMA(0,1,5)            with drift         : Inf
##  ARIMA(1,1,0)                               : 432.8756
##  ARIMA(1,1,0)            with drift         : 434.1589
##  ARIMA(1,1,1)                               : 432.2366
##  ARIMA(1,1,1)            with drift         : 431.7635
##  ARIMA(1,1,2)                               : 434.6252
##  ARIMA(1,1,2)            with drift         : Inf
##  ARIMA(1,1,3)                               : 436.6632
##  ARIMA(1,1,3)            with drift         : Inf
##  ARIMA(1,1,4)                               : 436.8366
##  ARIMA(1,1,4)            with drift         : Inf
##  ARIMA(2,1,0)                               : 432.5313
##  ARIMA(2,1,0)            with drift         : 433.449
##  ARIMA(2,1,1)                               : 434.5063
##  ARIMA(2,1,1)            with drift         : 433.7124
##  ARIMA(2,1,2)                               : 436.9823
##  ARIMA(2,1,2)            with drift         : Inf
##  ARIMA(2,1,3)                               : Inf
##  ARIMA(2,1,3)            with drift         : Inf
##  ARIMA(3,1,0)                               : 434.8565
##  ARIMA(3,1,0)            with drift         : 435.9464
##  ARIMA(3,1,1)                               : 436.788
##  ARIMA(3,1,1)            with drift         : Inf
##  ARIMA(3,1,2)                               : Inf
##  ARIMA(3,1,2)            with drift         : Inf
##  ARIMA(4,1,0)                               : 436.5007
##  ARIMA(4,1,0)            with drift         : 437.448
##  ARIMA(4,1,1)                               : Inf
##  ARIMA(4,1,1)            with drift         : 436.8826
##  ARIMA(5,1,0)                               : 436.434
##  ARIMA(5,1,0)            with drift         : 436.606
## 
## 
## 
##  Best model: ARIMA(0,1,1)            with drift
checkresiduals(forecast_ts_train_adj)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1) with drift
## Q* = 7.4191, df = 9, p-value = 0.5936
## 
## Model df: 1.   Total lags used: 10
#forecast_ts_train_adj + ts_train_adj
autoplot(forecast(forecast_ts_train_adj ))

  3. Forecast the deseasonalized and detrended time series:
forecast_ts_train_adj_det = auto.arima(ts_train_adj_det,
                            trace = TRUE, 
                            seasonal= FALSE,
                            stepwise=FALSE,
                            approximation=FALSE)
## 
##  ARIMA(0,0,0)            with zero mean     : -183.1965
##  ARIMA(0,0,0)            with non-zero mean : -181.4467
##  ARIMA(0,0,1)            with zero mean     : -195.2457
##  ARIMA(0,0,1)            with non-zero mean : -195.724
##  ARIMA(0,0,2)            with zero mean     : -192.9693
##  ARIMA(0,0,2)            with non-zero mean : -193.439
##  ARIMA(0,0,3)            with zero mean     : -190.7534
##  ARIMA(0,0,3)            with non-zero mean : -191.4021
##  ARIMA(0,0,4)            with zero mean     : -190.4085
##  ARIMA(0,0,4)            with non-zero mean : Inf
##  ARIMA(0,0,5)            with zero mean     : Inf
##  ARIMA(0,0,5)            with non-zero mean : Inf
##  ARIMA(1,0,0)            with zero mean     : -191.9675
##  ARIMA(1,0,0)            with non-zero mean : -190.6383
##  ARIMA(1,0,1)            with zero mean     : -192.971
##  ARIMA(1,0,1)            with non-zero mean : -193.4768
##  ARIMA(1,0,2)            with zero mean     : -190.5819
##  ARIMA(1,0,2)            with non-zero mean : Inf
##  ARIMA(1,0,3)            with zero mean     : -188.4672
##  ARIMA(1,0,3)            with non-zero mean : Inf
##  ARIMA(1,0,4)            with zero mean     : -188.2137
##  ARIMA(1,0,4)            with non-zero mean : Inf
##  ARIMA(2,0,0)            with zero mean     : -192.6467
##  ARIMA(2,0,0)            with non-zero mean : -191.6913
##  ARIMA(2,0,1)            with zero mean     : -190.668
##  ARIMA(2,0,1)            with non-zero mean : -191.4938
##  ARIMA(2,0,2)            with zero mean     : -188.1949
##  ARIMA(2,0,2)            with non-zero mean : Inf
##  ARIMA(2,0,3)            with zero mean     : Inf
##  ARIMA(2,0,3)            with non-zero mean : Inf
##  ARIMA(3,0,0)            with zero mean     : -190.3099
##  ARIMA(3,0,0)            with non-zero mean : -189.1897
##  ARIMA(3,0,1)            with zero mean     : -188.4346
##  ARIMA(3,0,1)            with non-zero mean : Inf
##  ARIMA(3,0,2)            with zero mean     : Inf
##  ARIMA(3,0,2)            with non-zero mean : Inf
##  ARIMA(4,0,0)            with zero mean     : -188.6455
##  ARIMA(4,0,0)            with non-zero mean : -187.6592
##  ARIMA(4,0,1)            with zero mean     : Inf
##  ARIMA(4,0,1)            with non-zero mean : -188.2978
##  ARIMA(5,0,0)            with zero mean     : -188.6625
##  ARIMA(5,0,0)            with non-zero mean : -188.4274
## 
## 
## 
##  Best model: ARIMA(0,0,1)            with non-zero mean
checkresiduals(forecast_ts_train_adj_det)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,0,1) with non-zero mean
## Q* = 7.1212, df = 8, p-value = 0.5236
## 
## Model df: 1.   Total lags used: 9
autoplot(forecast(forecast_ts_train_adj_det))

Q4. I ran the unit root tests and ARIMA models on all of the following: the original time series, the deseasonalized time series, and the deseasonalized-and-detrended time series. Judging by the plots, the adjusted versions performed poorly compared to feeding the weekly aggregated data to auto.arima with automatic seasonality modelling. Could you please share your viewpoint on this?
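One way to answer Q4 less subjectively than by eye is a hold-out comparison: fit each candidate on all but the last stretch of data and compare forecast::accuracy() on the hold-out. A self-contained sketch on simulated monthly-style data (substitute ts_week_train, ts_train_adj, etc. for the real comparison):

```r
library(forecast)

# Simulated seasonal series standing in for the real candidates.
set.seed(7)
y <- ts(20 + 10 * sin(2 * pi * seq_len(120) / 12) + rnorm(120), frequency = 12)

train <- window(y, end = c(9, 12))    # first 9 "years"
test  <- window(y, start = c(10, 1))  # final year held out

fit <- auto.arima(train, seasonal = TRUE)
acc <- accuracy(forecast(fit, h = length(test)), test)
acc["Test set", "RMSE"]   # compare this number across candidate pipelines
```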

6.2 VAR

ts_plot(ts(df_train$wind_speed))

p2 <- df_train |>
  ggplot( aes(x=date, y=wind_speed)) +
    geom_area(fill="#69b3a2", alpha=0.5) +
    geom_line(color="#69b3a2") +
    ylab("wind speed (kmph)") +
    theme_ipsum()

# Turn it interactive with ggplotly
p2 <- ggplotly(p2)
#p
p2
fig <- plot_ly(df_train, type = 'scatter', mode = 'lines')%>%
  add_trace(x = ~date, y = ~meantemp, name = 'MeanTemp')%>%
  add_trace(x = ~date, y = ~wind_speed, name = 'WindSpeed')%>%
  layout(title = 'Mean Temperature and Wind Speed', legend=list(title=list(text='variable')),
         xaxis = list(dtick = "M1", tickformat= "%b\n%Y"), width = 2000)
## Warning: Specifying width/height in layout() is now deprecated.
## Please specify in ggplotly() or plot_ly()
options(warn = -1)
fig <- fig %>%
  layout(
         xaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff',  tickangle = 0),
         yaxis = list(zerolinecolor = '#ffff',
                      zerolinewidth = 2,
                      gridcolor = 'ffff'),
         plot_bgcolor='#e5ecf6')


fig
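The section is headed VAR, but no model is fitted yet. As a starting point, base R’s stats::ar() fits a vector AR when given a multivariate series (the vars package offers a richer interface, though it is not in the package list above). A sketch on simulated stand-ins for meantemp and wind_speed, roughly what a multivariate follow-up to Q1 would begin with:

```r
# Simulated stand-ins for meantemp and wind_speed; in the real analysis this
# would be ts(cbind(df_train$meantemp, df_train$wind_speed)).
set.seed(3)
n <- 300
temp <- as.numeric(arima.sim(list(ar = 0.8), n))
wind <- 0.3 * temp + as.numeric(arima.sim(list(ar = 0.5), n))
dat  <- ts(cbind(meantemp = temp, wind_speed = wind))

fit <- ar(dat, order.max = 8)            # vector AR via Yule-Walker, AIC order
fit$order                                # selected lag order

pred <- predict(fit, n.ahead = 7)$pred   # joint 7-step-ahead forecast
dim(pred)                                # 7 rows, one column per variable
```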